Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
Abstract
In recent years, anti-spam filters have become necessary tools for Internet service providers to cope with the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules, each aimed at detecting different features of spam e-mails. In particular, researchers have investigated text categorisation techniques for designing modules that analyse the semantic content of e-mails, because of their potentially higher generalisation capability compared with the manually derived classification rules used in current server-side filters. However, spammers have very recently introduced a new trick: embedding the spam message in attached images, which can render ineffective all current techniques based on analysing the digital text in the subject and body fields of e-mails. In this paper we propose an approach to anti-spam filtering that exploits the text information embedded in images sent as attachments. Our approach applies state-of-the-art text categorisation techniques to text extracted by OCR tools from images attached to e-mails. The effectiveness of the proposed approach is experimentally evaluated on two large corpora of spam e-mails.
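The pipeline described in the abstract (extract text from attached images with OCR, then pass it to a text categoriser) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the categoriser here, so a small multinomial naive Bayes classifier stands in for it as an assumption. The OCR step is passed in as a plain callable, and the demo uses a fake one. All names (`tokenize`, `NaiveBayesText`, `classify_email`, `fake_ocr`) and the toy training data are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (crude bag of words)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial naive Bayes with add-one smoothing (a stand-in categoriser)."""

    def fit(self, docs: Iterable[tuple[str, str]]) -> "NaiveBayesText":
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        for text, label in docs:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_email(
    subject: str,
    body: str,
    images: list[bytes],
    ocr: Callable[[bytes], str],
    model: NaiveBayesText,
) -> str:
    """Classify an e-mail from its digital text plus text OCR'd from attachments."""
    ocr_text = " ".join(ocr(image) for image in images)
    return model.predict(" ".join([subject, body, ocr_text]))


# Toy training data (hypothetical, for illustration only).
train = [
    ("cheap pills buy now", "spam"),
    ("win money now free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
]
model = NaiveBayesText().fit(train)


def fake_ocr(image: bytes) -> str:
    """Fake OCR for the demo: pretends the image bytes are the recognised text."""
    return image.decode("ascii")


# The body is empty; the spam message is only present in the "image".
label = classify_email("hi", "", [b"buy cheap pills now"], fake_ocr, model)
```

In a real setting, `fake_ocr` would be replaced by a call to an actual OCR tool, and the categoriser would be trained on a labelled e-mail corpus. The point of the sketch is only that the OCR output is treated as ordinary text and fed to the same classifier as the subject and body.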
Related resources
Image Spam Filtering by Content Obscuring Detection
We address the problem of filtering image spam, a rapidly spreading kind of spam in which the text message is embedded into attached images to defeat spam filtering techniques based on the analysis of e-mail body text. We propose an approach based on low-level image processing techniques to detect one of the main characteristics of most image spam, namely the use of content obscuring technique...
Image spam filtering using textual and visual information
In this paper we focus on so-called image spam, which consists of embedding the spam message into images attached to e-mails to circumvent statistical techniques based on the analysis of the body text of e-mails (such as “Bayesian filters”), and of applying content obscuring techniques to such images to make them unreadable by standard OCR systems without compromising human readability. We arg...
Embedded-Text Detection and Its Application to Anti-Spam Filtering
Ching-Tung Wu. Embedded text in images usually carries important messages about the content. In the past, several algorithms have been proposed to detect text boxes in video frames. Previous work often followed a multi-step framework using a combination of image-analysis and machine-learning techniques. In this work, we propose a u...
A Sobel Edge Detection Algorithm Based System for Analyzing and Classifying Image Based Spam
Early spam e-mails were text-only; however, spammers have moved to more sophisticated techniques that involve images, now generally termed image-based spam. In most image-based spam, the entire spam message, which may sometimes be text, is embedded in an image of any format. This type of spam e-mail adds another dimension to the spam filtering problem. Extracting text f...
Improving Image Spam Filtering Using Image Text Features
In this paper we consider the approach to image spam filtering based on using image classifiers aimed at discriminating between ham and spam images, previously proposed by other authors. In previous works this approach was implemented using “generic” image features. In this paper we show that its effectiveness can be improved by using specific features related to the graphical characteristics o...
Journal title:
- Journal of Machine Learning Research
Volume 6, Issue
Pages -
Publication date: 2006